AITopics | student code

student code

f29a179746902e331572c483c45e5086-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-15-2026, 01:38:36 GMT

attribution, localization, reviewer, (15 more...)

Technology: Information Technology > Artificial Intelligence (0.70)

Evaluating Generative AI for CS1 Code Grading: Direct vs Reverse Methods

Memon, Ahmad, Mohamed, Abdallah

arXiv.org Artificial Intelligence, Nov-20-2025

Manual grading of programming assignments in introductory computer science courses can be time-consuming and prone to inconsistencies. While unit testing is commonly used for automatic evaluation, it typically follows a binary pass/fail model and does not give partial marks. Recent advances in large language models (LLMs) offer the potential for automated, scalable, and more objective grading. This paper compares two AI-based grading techniques: Direct, where the AI model applies a rubric directly to student code, and Reverse (a newly proposed approach), where the AI first fixes errors, then deduces a grade based on the nature and number of fixes. Each method was evaluated on both the instructor's original grading scale and a tenfold expanded scale to assess the impact of range on AI grading accuracy. To assess their effectiveness, AI-assigned scores were evaluated against human tutor evaluations on a range of coding problems and error types. Initial findings suggest that while the Direct approach is faster and more straightforward, the Reverse technique often provides a more fine-grained assessment by focusing on correction effort. Both methods require careful prompt engineering, particularly for allocating partial credit and handling logic errors. To further test consistency, we also used synthetic student code generated using Gemini Flash 2.0, which allowed us to evaluate AI graders on a wider range of controlled error types and difficulty levels. We discuss the strengths and limitations of each approach, practical considerations for prompt design, and future directions for hybrid human-AI grading systems that aim to improve consistency, efficiency, and fairness in CS courses.
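
The Reverse idea, deducing a grade from the nature and number of fixes, can be illustrated with a minimal sketch. The `Fix` record, the `reverse_grade` function, the category names, and the penalty weights are all hypothetical and are not taken from the paper, where an AI model produces both the fixes and the grade.

```python
from dataclasses import dataclass

# Hypothetical penalty per fix category, as a fraction of the maximum score.
# These weights are illustrative assumptions, not values from the paper.
PENALTY = {"syntax": 0.05, "runtime": 0.10, "logic": 0.20}


@dataclass
class Fix:
    category: str      # one of the keys in PENALTY
    description: str   # what had to change to make the code correct


def reverse_grade(fixes: list[Fix], max_score: float) -> float:
    """Deduce a score from the fixes needed to make a submission correct."""
    deduction = sum(PENALTY[f.category] for f in fixes)
    # Clamp so that heavily broken submissions bottom out at zero.
    return round(max_score * max(0.0, 1.0 - deduction), 2)


fixes = [
    Fix("syntax", "added missing colon after for-loop header"),
    Fix("logic", "changed range(n) to range(n + 1)"),
]
print(reverse_grade(fixes, max_score=10))    # original grading scale
print(reverse_grade(fixes, max_score=100))   # tenfold expanded scale
```

Passing `max_score` as a parameter mirrors the abstract's comparison of the original and expanded scales: the same list of fixes maps onto either range.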

large language model, machine learning, natural language, (19 more...)

arXiv:2511.14798

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.41)

Pattern-based Knowledge Component Extraction from Student Code Using Representation Learning

Hoq, Muntasir, Pitts, Griffin, Lan, Andrew, Brusilovsky, Peter, Akram, Bita

arXiv.org Artificial Intelligence, Oct-15-2025

Effective personalized learning in computer science education depends on accurately modeling what students know and what they need to learn. While Knowledge Components (KCs) provide a foundation for such modeling, automated KC extraction from student code is inherently challenging: discovered KCs are often hard to explain, and programming problems are open-ended, with significant structural variability across student solutions and complex interactions among programming concepts. In this work, we propose a novel, explainable framework for automated KC discovery through pattern-based KCs: recurring structural patterns within student code that capture the specific programming patterns and language constructs that students must master. To this end, we train a Variational Autoencoder to generate important representative patterns from student code, guided by an explainable, attention-based code representation model that identifies important correct and incorrect pattern implementations from student code. These patterns are then clustered to form pattern-based KCs. We evaluate our KCs using two well-established methods informed by Cognitive Science: learning curve analysis and Deep Knowledge Tracing (DKT). Experimental results demonstrate meaningful learning trajectories and significant improvements in DKT predictive performance over traditional KT methods. This work advances knowledge modeling in CS education by providing an automated, scalable, and explainable framework for identifying granular code patterns and algorithmic constructs, essential for student learning.
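
Learning curve analysis is often carried out by fitting a power law of practice to a KC's error rate over successive practice opportunities. The sketch below assumes that model; `fit_power_law` and the sample data are hypothetical, and the paper's exact procedure may differ.

```python
import math


def fit_power_law(error_rates):
    """Fit error = a * opportunity**(-b) by least squares in log-log space.

    error_rates[i] is the observed error rate at practice opportunity i + 1
    and must be strictly positive.
    """
    xs = [math.log(i) for i in range(1, len(error_rates) + 1)]
    ys = [math.log(e) for e in error_rates]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope  # b > 0 means the error rate falls with practice


# Synthetic, noise-free data generated from a = 0.6, b = 0.5.
error_rates = [0.6 * n ** -0.5 for n in range(1, 7)]
a, b = fit_power_law(error_rates)
print(f"a={a:.3f}, b={b:.3f}")
```

Under this model, a KC whose fitted `b` is clearly positive shows the declining error curve that is usually read as evidence of learning.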

artificial intelligence, machine learning, natural language, (20 more...)

arXiv:2508.09281

Country:

Europe (0.93)
North America > United States > Massachusetts (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

McMining: Automated Discovery of Misconceptions in Student Code

Al-Hossami, Erfan, Bunescu, Razvan

arXiv.org Artificial Intelligence, Oct-13-2025

When learning to code, students often develop misconceptions about various programming language concepts. These can not only lead to bugs or inefficient code, but also slow down the learning of related concepts. In this paper, we introduce McMining, the task of mining programming misconceptions from samples of code from a student. To enable the training and evaluation of McMining systems, we develop an extensible benchmark dataset of misconceptions together with a large set of code samples where these misconceptions are manifested. We then introduce two LLM-based McMiner approaches and through extensive evaluations show that models from the Gemini, Claude, and GPT families are effective at discovering misconceptions in student code.

large language model, machine learning, programming language, (20 more...)

arXiv:2510.08827

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Software > Programming Languages (0.89)

f29a179746902e331572c483c45e5086-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-20-2025, 09:02:21 GMT

attribution, identifier, reviewer, (13 more...)

Technology: Information Technology > Artificial Intelligence (0.50)

ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle

Miroyan, Mihran, Niousha, Rose, Gonzalez, Joseph E., Ranade, Gireeja, Norouzi, Narges

arXiv.org Artificial Intelligence, Jul-21-2025

Large Language Models (LLMs) have shown strong performance on programming tasks, but can they generate code the way real students do: imperfect, iterative, and stylistically diverse? We present ParaStudent, a systematic study of LLM-based "student-like" code generation in an introductory programming course setting. Using a dataset of timestamped student submissions across multiple semesters, we design low- and high-resolution experiments to model student progress and evaluate code outputs along semantic, functional, and stylistic dimensions. Our results show that fine-tuning significantly improves alignment with real student trajectories and captures error patterns, incremental improvements, and stylistic variations more faithfully. This study shows that modeling realistic student code requires capturing learning dynamics through context-aware generation, temporal modeling, and multi-dimensional evaluation. Code for experiments and evaluation is available at https://github.com/mmiroyan/ParaStudent.

large language model, machine learning, natural language, (20 more...)

arXiv:2507.12674

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Technology (0.46)
Education > Curriculum (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rubric Is All You Need: Enhancing LLM-based Code Evaluation With Question-Specific Rubrics

Pathak, Aditya, Gandhi, Rachit, Uttam, Vaibhav, Devansh, Nakka, Yashwanth, Jindal, Aaryan Raj, Ghosh, Pratyush, Ramamoorthy, Arnav, Verma, Shreyash, Mittal, Aditya, Ased, Aashna, Khatri, Chirag, Challa, Jagat Sesh, Kumar, Dhruv

arXiv.org Artificial Intelligence, Mar-31-2025

Since the disruption in LLM technology brought about by the release of GPT-3 and ChatGPT, LLMs have shown remarkable promise in programming-related tasks. While code generation remains a popular field of research, code evaluation using LLMs remains a problem with no conclusive solution. In this paper, we focus on LLM-based code evaluation and attempt to fill in the existing gaps. We propose novel multi-agentic approaches using question-specific rubrics tailored to the problem statement, arguing that these perform better for logical assessment than the existing approaches that use question-agnostic rubrics. To address the lack of suitable evaluation datasets, we introduce two datasets: a Data Structures and Algorithms dataset containing 150 student submissions from a popular Data Structures and Algorithms practice website, and an Object Oriented Programming dataset comprising 80 student submissions from undergraduate computer science courses. In addition to using standard metrics (Spearman Correlation, Cohen's Kappa), we propose a new metric called Leniency, which quantifies evaluation strictness relative to expert assessment. Our comprehensive analysis demonstrates that question-specific rubrics significantly enhance logical assessment of code in educational settings, providing better feedback aligned with instructional goals beyond mere syntactic correctness.
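
The two standard agreement metrics named in the abstract can be computed with the standard library alone, as sketched below on made-up scores. The `leniency` function is only an assumed stand-in (mean signed gap between LLM and expert scores, normalised by the maximum score); the paper's actual definition of Leniency is not given in the abstract and may differ.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(rx, ry))
    var_x = sum((p - mean_x) ** 2 for p in rx)
    var_y = sum((q - mean_y) ** 2 for q in ry)
    return cov / (var_x * var_y) ** 0.5


def leniency(llm_scores, expert_scores, max_score):
    """Assumed stand-in, not the paper's definition: positive means lenient."""
    gaps = [l - e for l, e in zip(llm_scores, expert_scores)]
    return sum(gaps) / (len(gaps) * max_score)


# Made-up scores on a 1-5 scale, for illustration only.
expert = [2, 4, 5, 3, 1]
llm = [3, 4, 5, 4, 1]
print(spearman(llm, expert), cohen_kappa(llm, expert), leniency(llm, expert, 5))
```

Spearman captures whether the LLM orders submissions as the expert does, kappa captures exact agreement beyond chance, and a signed measure such as the stand-in above captures the direction of any systematic gap.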

large language model, machine learning, natural language, (15 more...)

arXiv:2503.23989

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.05)
Asia > India (0.05)
North America > United States > New York > New York County > New York City (0.05)
(8 more...)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.93)
Research Report > Promising Solution (0.66)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Assessment & Standards (0.93)
Education > Educational Setting (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Multimodal Programming in Computer Science with Interactive Assistance Powered by Large Language Model

Gupta, Rajan Das, Hosain, Md. Tanzib, Mridha, M. F., Ahmed, Salah Uddin

arXiv.org Artificial Intelligence, Mar-12-2025

LLM chatbot interfaces allow students to get instant, interactive assistance with homework, but using them carelessly may not advance educational objectives. In this study, an interactive homework help system based on DeepSeek R1 is developed and first implemented for students enrolled in a large introductory computer science programming course. In addition to an assist button in a well-known code editor, our assistant also has a feedback option in our command-line automatic evaluator. It wraps student work in a personalized prompt that advances our educational objectives without offering answers straight away. We found that our assistant can recognize students' conceptual difficulties and provide ideas, plans, and template code in pedagogically appropriate ways. However, among other mistakes, it occasionally labels correct student code as incorrect or encourages students to use correct-but-lesson-inappropriate approaches, which can lead to long and frustrating journeys for the students. After discussing many development and deployment issues, we provide our conclusions and future actions.

language model, proceedings, student, (16 more...)

arXiv:2503.06552

Country:

Europe > Norway (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.68)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.47)
Education > Educational Setting (0.46)
Education > Instructional Theory > Educational Objectives (0.44)
Education > Curriculum > Subject-Specific Education (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Test Case-Informed Knowledge Tracing for Open-ended Coding Tasks

Duan, Zhangqi, Fernandez, Nigel, Hicks, Alexander, Lan, Andrew

arXiv.org Artificial Intelligence, Dec-20-2024

Open-ended coding tasks, which ask students to construct programs according to certain specifications, are common in computer science education. Student modeling for such tasks can be challenging, since their open-ended nature means that student code can be diverse. Traditional knowledge tracing (KT) models that only analyze response correctness may not fully capture nuances in student knowledge from student code. In this paper, we introduce Test case-Informed Knowledge Tracing for Open-ended Coding (TIKTOC), a framework to simultaneously analyze and predict both open-ended student code and whether the code passes each test case. We augment the existing CodeWorkout dataset with the test cases used for a subset of the open-ended coding questions, and propose a multi-task learning KT method to simultaneously analyze and predict 1) whether a student's code submission passes each test case and 2) the student's open-ended code, using a large language model as the backbone. We quantitatively show that these methods outperform existing KT methods for coding that only use the overall score a code submission receives. We also qualitatively demonstrate how test case information, combined with open-ended code, helps us gain fine-grained insights into student knowledge.
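
The multi-task setup described above, predicting per-test-case outcomes and the student's code together, is commonly trained with a weighted sum of the two task losses. The sketch below shows that generic pattern only; `multitask_loss`, the `weight` coefficient, and the sample numbers are hypothetical, and TIKTOC's actual objective and LLM backbone are not reproduced here.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one predicted pass probability."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))


def multitask_loss(pass_probs, pass_labels, token_log_probs, weight=0.5):
    """Weighted sum of a per-test-case loss and a code-prediction loss.

    pass_probs / pass_labels: one entry per test case (label 1 = passed).
    token_log_probs: log-probability the model assigned to each token of the
    student's actual code. `weight` is a hypothetical mixing coefficient.
    """
    test_case_loss = sum(map(bce, pass_probs, pass_labels)) / len(pass_probs)
    code_loss = -sum(token_log_probs) / len(token_log_probs)
    return weight * test_case_loss + (1 - weight) * code_loss


# Illustrative values for one submission with three test cases.
loss = multitask_loss(
    pass_probs=[0.9, 0.2, 0.7],
    pass_labels=[1, 0, 1],
    token_log_probs=[-0.1, -0.3, -0.2],
)
print(f"{loss:.4f}")
```

Setting `weight` to 1 or 0 recovers a single-task objective, which is one simple way to compare the multi-task variant against either task alone.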

large language model, machine learning, natural language, (19 more...)

arXiv:2410.10829

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Virginia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Technology > Educational Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)

Fine-tuning Smaller Language Models for Question Answering over Financial Documents

Phogat, Karmvir Singh, Puranam, Sai Akhil, Dasaratha, Sridhar, Harsha, Chetan, Ramakrishna, Shashishekar

arXiv.org Artificial Intelligence, Aug-22-2024

Recent research has shown that smaller language models can acquire substantial reasoning abilities when fine-tuned with reasoning exemplars crafted by a significantly larger teacher model. We explore this paradigm for the financial domain, focusing on the challenge of answering questions that require multi-hop numerical reasoning over financial texts. We assess the performance of several smaller models that have been fine-tuned to generate programs that encode the required financial reasoning and calculations. Our findings demonstrate that these fine-tuned smaller models approach the performance of the teacher model. To provide a granular analysis of model performance, we propose an approach to investigate the specific student model capabilities that are enhanced by fine-tuning. Our empirical analysis indicates that fine-tuning refines the student models' ability to express and apply the required financial concepts, along with adapting the entity extraction for the specific data format. In addition, we hypothesize and demonstrate that comparable financial reasoning capability can be induced using relatively small datasets.

accuracy, dataset, fine-tuning, (14 more...)

arXiv:2408.12337

Country:

North America > United States > Rhode Island > Providence County > Smithfield (0.04)
Oceania > Australia (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
